Research on Text Categorization Based on a Weakly-Supervised Transfer Learning Method
نویسندگان
چکیده
This paper presents a weakly-supervised transfer learning based text categorization method, which does not need to tag new training documents when facing classification tasks in new area. Instead, we can take use of the already tagged documents in other domains to accomplish the automatic categorization task. By extracting linguistic information such as part-of-speech, semantic, co-occurrence of keywords, we construct a domain-adaptive transfer knowledge base. Relation experiments show that, the presented method improved the performance of text categorization on traditional corpus, and our results were only about 5% lower than the baseline on cross-domain classification tasks. And thus we demonstrate the effectiveness of our method.
منابع مشابه
Semi-supervised Learning with Weakly-Related Unlabeled Data: Towards Better Text Categorization
The cluster assumption is exploited by most semi-supervised learning (SSL) methods. However, if the unlabeled data is merely weakly related to the target classes, it becomes questionable whether driving the decision boundary to the low density regions of the unlabeled data will help the classification. In such case, the cluster assumption may not be valid; and consequently how to leverage this ...
متن کاملImproving the Operation of Text Categorization Systems with Selecting Proper Features Based on PSO-LA
With the explosive growth in amount of information, it is highly required to utilize tools and methods in order to search, filter and manage resources. One of the major problems in text classification relates to the high dimensional feature spaces. Therefore, the main goal of text classification is to reduce the dimensionality of features space. There are many feature selection methods. However...
متن کاملHarnessing the Expertise of 70, 000 Human Editors: Knowledge-Based Feature Generation for Text Categorization
Most existing methods for text categorization employ induction algorithms that use the words appearing in the training documents as features. While they perform well in many categorization tasks, these methods are inherently limited when faced with more complicated tasks where external knowledge is essential. Recently, there have been efforts to augment these basic features with external knowle...
متن کاملSemi-supervised Convolutional Neural Networks for Text Categorization via Region Embedding
This paper presents a new semi-supervised framework with convolutional neural networks (CNNs) for text categorization. Unlike the previous approaches that rely on word embeddings, our method learns embeddings of small text regions from unlabeled data for integration into a supervised CNN. The proposed scheme for embedding learning is based on the idea of two-view semi-supervised learning, which...
متن کاملRestricted Transfer Learning for Text Categorization
In practice, machine learning systems deal with multiple datasets over time. When the feature spaces between these datasets overlap, it is possible to transfer information from one task to another. Typically in transfer learning, all labeled data from a source task is saved to be applied to a new target task thereby raising concerns of privacy, memory and scaling. To ameliorate such concerns, w...
متن کامل